Code
pak::pak(c('tidyverse','openxlsx','arrow','readxl'))Tony Duan
This document provides a comprehensive guide to reading data into R (input) and writing data out of R (output). We will cover a wide variety of common file formats, from traditional text files to modern, high-performance formats.

Comma-Separated Values (CSV) are one of the most common formats for storing tabular data. The readr package (part of the tidyverse) provides the read_csv() function, which is fast and efficient.
# Create a sample data frame to write to a CSV
sample_df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  score = c(95, 82, 78, 91, 88)
)
write_csv(sample_df, "data/sample_data.csv")
# Read the CSV file back into R
data_from_csv <- read_csv("data/sample_data.csv")
head(data_from_csv)# A tibble: 5 × 3
     id name    score
  <dbl> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88For text files that use delimiters other than commas (like tabs or semicolons), you can use read_delim().
# A tibble: 5 × 3
     id name    score
  <dbl> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88The readxl package is excellent for reading .xls and .xlsx files. The openxlsx package can be used for both reading and writing.
# A tibble: 5 × 3
     id name    score
  <dbl> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88For large datasets, text-based formats are inefficient. Binary formats like Parquet and Feather are much faster to read and write and take up less disk space.
Parquet is a columnar storage format optimized for big data processing. The arrow package provides the interface to work with Parquet files.
# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88Feather is a fast, lightweight binary format designed for sharing data between R and Python.
# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88R has its own native formats for saving any R object.
.Rds: Saves a single R object..RData: Saves multiple R objects and can save the entire workspace.# Save multiple objects to one file
save(sample_df, mtcars, file = "data/multiple_objects.RData")
# Clear the objects from the environment to show they are loaded
rm(sample_df, mtcars)
# Load the objects back into the workspace
load("data/multiple_objects.RData")
# Check that they exist again
head(sample_df)# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88Connecting to databases is covered in detail in a separate guide (6 SQL database.qmd), but here is a quick example using SQLite.
library(DBI)
library(RSQLite)
# Create an in-memory SQLite database and write mtcars to it
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars_table", mtcars)
# Read the full table back into an R data frame
mtcars_from_db <- dbReadTable(con, "mtcars_table")
dbDisconnect(con)
head(mtcars_from_db)   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1Writing data is the reverse of reading. The functions are generally named write_*().
# Establish a connection
con <- dbConnect(RSQLite::SQLite(), "data/my_output_database.sqlite")
# Write the iris dataset to a new table named 'iris_table'
# `overwrite = TRUE` will replace the table if it already exists
dbWriteTable(con, "iris_table", iris, overwrite = TRUE)
# List tables to confirm it was created
print(dbListTables(con))[1] "iris_table"You can write a data frame to a new or existing Google Sheet. This requires authentication.
library(googlesheets4)
# Create a data frame to write
my_data_to_write <- data.frame(
  Name = c("Ada Lovelace", "Grace Hopper"),
  Contribution = c("First algorithm", "COBOL")
)
# Write the data to a new Google Sheet file named 'R-Sheet-Output'
ss <- write_sheet(my_data_to_write, ss = "R-Sheet-Output", sheet = "Sheet1")
# The returned object `ss` contains the URL of the new sheet
print(ss)---
title: "Input & Output in R"
author: "Tony Duan"
execute:
  warning: false
  error: false
format:
  html:
    toc: true
    toc-location: right
    code-fold: show
    code-tools: true
    number-sections: false
    code-block-bg: true
    code-block-border-left: "#31BAE9"
---
This document provides a comprehensive guide to reading data into R (input) and writing data out of R (output). We will cover a wide variety of common file formats, from traditional text files to modern, high-performance formats.
{width="500"}
```{r}
pak::pak(c('tidyverse','openxlsx','arrow','readxl'))
```
```{r}
# Load common packages for I/O operations
library(tidyverse)
library(openxlsx)
library(arrow)
library(readxl)
```
# 1. Input: Reading Data into R
## 1.1. Text Files (CSV, TXT)
### Reading CSV Files
Comma-Separated Values (CSV) are one of the most common formats for storing tabular data. The `readr` package (part of the `tidyverse`) provides the `read_csv()` function, which is fast and efficient.
```{r}
# Create a sample data frame to write to a CSV
sample_df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  score = c(95, 82, 78, 91, 88)
)
write_csv(sample_df, "data/sample_data.csv")
# Read the CSV file back into R
data_from_csv <- read_csv("data/sample_data.csv")
head(data_from_csv)
```
### Reading Delimited Files
For text files that use delimiters other than commas (like tabs or semicolons), you can use `read_delim()`.
```{r}
# Write a tab-delimited file
write_delim(sample_df, "data/sample_data.txt", delim = "\t")
# Read the tab-delimited file
data_from_txt <- read_delim("data/sample_data.txt", delim = "\t")
head(data_from_txt)
```
## 1.2. Excel Files (.xlsx)
The `readxl` package is excellent for reading `.xls` and `.xlsx` files. The `openxlsx` package can be used for both reading and writing.
```{r}
# Write the sample data to an Excel file
write.xlsx(sample_df, "data/sample_data.xlsx")
# Read the data from the Excel file
data_from_excel <- read_excel("data/sample_data.xlsx")
head(data_from_excel)
```
## 1.3. High-Performance Formats (Parquet, Feather)
For large datasets, text-based formats are inefficient. Binary formats like Parquet and Feather are much faster to read and write and take up less disk space.
### Reading Parquet Files
Parquet is a columnar storage format optimized for big data processing. The `arrow` package provides the interface to work with Parquet files.
```{r}
# Write data to a Parquet file
write_parquet(sample_df, "data/sample_data.parquet")
# Read data from the Parquet file
data_from_parquet <- read_parquet("data/sample_data.parquet")
head(data_from_parquet)
```
### Reading Feather Files
Feather is a fast, lightweight binary format designed for sharing data between R and Python.
```{r}
# Write data to a Feather file
write_feather(sample_df, "data/sample_data.feather")
# Read data from the Feather file
data_from_feather <- read_feather("data/sample_data.feather")
head(data_from_feather)
```
## 1.4. R-Specific Formats (.Rds, .RData)
R has its own native formats for saving any R object.
-   `.Rds`: Saves a **single** R object.
-   `.RData`: Saves **multiple** R objects and can save the entire workspace.
### Reading an .Rds File
```{r}
# Save a single object (our data frame)
saveRDS(sample_df, "data/sample_df.rds")
# Read the object back
data_from_rds <- readRDS("data/sample_df.rds")
head(data_from_rds)
```
### Loading an .RData File
```{r}
# Save multiple objects to one file
save(sample_df, mtcars, file = "data/multiple_objects.RData")
# Clear the objects from the environment to show they are loaded
rm(sample_df, mtcars)
# Load the objects back into the workspace
load("data/multiple_objects.RData")
# Check that they exist again
head(sample_df)
```
## 1.5. Reading from a Database
Connecting to databases is covered in detail in a separate guide (`6 SQL database.qmd`), but here is a quick example using SQLite.
```{r}
library(DBI)
library(RSQLite)
# Create an in-memory SQLite database and write mtcars to it
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars_table", mtcars)
# Read the full table back into an R data frame
mtcars_from_db <- dbReadTable(con, "mtcars_table")
dbDisconnect(con)
head(mtcars_from_db)
```
## 1.6 Read .docx (Word) file
```{r}
library(officer)
# docx = read_docx("data/example03.docx")
# docx_summary(docx)
```
## 1.7 Read PDF Tables
```{r}
library(tabulapdf)
# out <- extract_tables('data/60870-By-the-Numbers.pdf')
# out[[1]]
```
# 2. Output: Writing Data from R
Writing data is the reverse of reading. The functions are generally named `write_*()`.
## 2.1. Write to CSV
```{r}
# The `write_csv` function from readr is a great choice
write_csv(mtcars, "data/mtcars_output.csv")
```
## 2.2. Write to Excel
```{r}
# The `write.xlsx` function from openxlsx is very convenient
write.xlsx(iris, "data/iris_output.xlsx")
```
## 2.3. Write to Parquet
```{r}
# Use the arrow package for high-performance writing
write_parquet(airquality, "data/airquality_output.parquet")
```
## 2.4. Write to Feather
```{r}
# Ideal for R-to-Python data exchange
write_feather(diamonds, "data/diamonds_output.feather")
```
## 2.5. Write to R-Specific Formats
```{r}
# Save a single model object
my_model <- lm(mpg ~ hp, data = mtcars)
saveRDS(my_model, "data/linear_model.rds")
# Save the entire current workspace
# save.image(file = "data/my_full_workspace.RData")
```
## 2.6. Write to a Database
```{r}
# Establish a connection
con <- dbConnect(RSQLite::SQLite(), "data/my_output_database.sqlite")
# Write the iris dataset to a new table named 'iris_table'
# `overwrite = TRUE` will replace the table if it already exists
dbWriteTable(con, "iris_table", iris, overwrite = TRUE)
# List tables to confirm it was created
print(dbListTables(con))
# Clean up
dbDisconnect(con)
```
## 2.7 Write to Google Sheets
You can write a data frame to a new or existing Google Sheet. This requires authentication.
```{r}
#| eval: false
library(googlesheets4)
# Create a data frame to write
my_data_to_write <- data.frame(
  Name = c("Ada Lovelace", "Grace Hopper"),
  Contribution = c("First algorithm", "COBOL")
)
# Write the data to a new Google Sheet file named 'R-Sheet-Output'
ss <- write_sheet(my_data_to_write, ss = "R-Sheet-Output", sheet = "Sheet1")
# The returned object `ss` contains the URL of the new sheet
print(ss)
```